Chinese Treebanks and Grammar Extraction

Authors

  • Keh-Jiann Chen
  • Yu-Ming Hsieh
Abstract

Preparing a knowledge bank is a very difficult task. In this paper, we discuss knowledge extraction from the manually examined Sinica Treebank. Categorical information, word-to-word relations, word collocations, new syntactic patterns, and sentence structures are obtained. A search system for Chinese sentence structures was developed in this study. Using pre-extracted data and SQL commands, the system answers user queries efficiently. We also analyze the extracted grammars to study the tradeoffs between the granularity of the grammar rules and their coverage and ambiguity. This analysis indicates how large a treebank must be to suffice for grammar extraction. Finally, we analyze the tradeoffs between grammar coverage and ambiguity using parsing results obtained with grammar rules of different granularities.
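The granularity tradeoff described in the abstract can be illustrated with a small sketch. It is not the authors' system: the bracketed tree format, the `role:category` labels, the English toy sentences, and the function names `parse`, `rules`, and `summarize` are all assumptions made for illustration. Fine-grained rules keep the role prefix on each label, while coarse rules drop it. "Rules per left-hand side" is used here only as a crude stand-in for ambiguity.

```python
from collections import Counter

def parse(s):
    """Parse a bracketed tree string into (label, children); leaves are plain strings."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return node()

def rules(tree, coarse):
    """Yield (lhs, rhs) phrase rules; preterminal -> word rules are skipped."""
    label, children = tree
    kids = [c for c in children if isinstance(c, tuple)]
    if not kids:
        return
    # Coarse granularity drops the hypothetical "role:" prefix from each label.
    name = (lambda l: l.split(":")[-1]) if coarse else (lambda l: l)
    yield (name(label), tuple(name(k[0]) for k in kids))
    for k in kids:
        yield from rules(k, coarse)

def summarize(train, held_out, coarse):
    """Return (number of distinct rules, held-out rule coverage, rules per LHS)."""
    grammar = Counter(r for t in train for r in rules(parse(t), coarse))
    test_rules = [r for t in held_out for r in rules(parse(t), coarse)]
    coverage = sum(r in grammar for r in test_rules) / len(test_rules)
    lhs_count = len({lhs for lhs, _ in grammar})
    return len(grammar), coverage, len(grammar) / lhs_count

# Toy treebank (invented data, not taken from the Sinica Treebank).
TRAIN = [
    "(S (agent:NP (Head:N dogs)) (Head:V chase) (goal:NP (Head:N cats)))",
    "(S (theme:NP (Head:N birds)) (Head:V sing))",
]
HELD_OUT = [
    "(S (theme:NP (Head:N cats)) (Head:V chase) (goal:NP (Head:N dogs)))",
]

for coarse in (False, True):
    n_rules, coverage, per_lhs = summarize(TRAIN, HELD_OUT, coarse)
    kind = "coarse" if coarse else "fine"
    print(f"{kind}: rules={n_rules} coverage={coverage:.2f} rules_per_lhs={per_lhs:.2f}")
```

On this toy data the fine-grained grammar has more rules but misses one held-out rule, while the coarse grammar covers every held-out rule at the cost of more alternatives per left-hand side. Real treebank figures would of course differ.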

Similar resources

Exploiting Heterogeneous Treebanks for Parsing

We address the issue of using heterogeneous treebanks for parsing by breaking it down into two sub-problems, converting grammar formalisms of the treebanks to the same one, and parsing on these homogeneous treebanks. First we propose to employ an iteratively trained target grammar parser to perform grammar formalism conversion, eliminating predefined heuristic rules as required in previous meth...

Grammar Extraction from Treebanks for Hindi and Telugu

Grammars play an important role in many Natural Language Processing (NLP) applications. The traditional approach to creating grammars manually, besides being labor-intensive, has several limitations. With the availability of large scale syntactically annotated treebanks, it is now possible to automatically extract an approximate grammar of a language in any of the existing formalisms from a cor...

Exploiting Multiple Treebanks for Parsing with Quasi-synchronous Grammars

We present a simple and effective framework for exploiting multiple monolingual treebanks with different annotation guidelines for parsing. Several types of transformation patterns (TP) are designed to capture the systematic annotation inconsistencies among different treebanks. Based on such TPs, we design quasisynchronous grammar features to augment the baseline parsing models. Our approach ca...

Comparing Lexicalized Treebank Grammars Extracted From Chinese, Korean, And English Corpora

In this paper, we present a method for comparing Lexicalized Tree Adjoining Grammars extracted from annotated corpora for three languages: English, Chinese and Korean. This method makes it possible to do a quantitative comparison between the syntactic structures of each language, thereby providing a way of testing the Universal Grammar Hypothesis, the foundation of modern linguistic theories. 1...

A Semantics Oriented Grammar for Chinese Treebanking

Chinese grammar engineering has been a much debated task. Whilst semantic information has been reckoned crucial for Chinese syntactic analysis and downstream applications, existing Chinese treebanks lack a consistent and strict sentential semantic formalism. In this paper, we introduce a semantics oriented grammar for Chinese, designed to provide basic supports for tasks such as automatic semant...

Publication date: 2004